Abstract

Background: A growing number of clinical scales built from the responses given by patients to a set of
items (Patient Reported Outcomes, PRO) are validated with models based on Item Response Theory, and more
specifically, with a Rasch model. Missing data are frequent in validation samples. The aim of this
paper is to compare sixteen methods for handling missing data (mainly based on simple imputation) in the
context of the psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch
model are compared.

Methods: A simulation study was performed to consider several cases, notably whether the
missing values are informative or not, and the rate of missing data.

Results: Several imputation methods produce bias on psychometrical indexes (generally, the imputation methods
artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on
the Personal Mean Score (PMS), which is the most commonly used imputation method in practice.

Conclusions: Several imputation methods should be avoided, in particular PMS imputation. From a general point
of view, it is important to use an imputation method that considers both the ability of the patient (measured for
example by his/her score), and the difficulty of the item (measured for example by its rate of favourable
responses). Another recommendation is to always consider the addition of a random process in the imputation
method, because such a process reduces the bias. Last, the analysis performed without imputation of the
missing data (available case analysis) is an interesting alternative to simple imputation in this context.



Background 

Patient Reported Outcomes (PRO) are now commonly
encountered in clinical research to take into
account important unobservable characteristics. They
are used for evaluating endpoints that cannot be directly
observed and measured, such as Health Related Quality
of Life (HR-QoL), anxiety, depressive symptoms, fatigue
or addictive behaviors. Usually, patients respond to a
questionnaire containing several items, with binary or
ordinal responses, and the responses are often combined
to give scores. The aim of clinical research is usually to
compare two or more groups of patients on different
outcomes, which can be, for instance, PRO.

Two main types of analysis can be used to handle 
such data: Classical Test Theory (CTT) and Item 
Response Theory (IRT). In CTT, the observed scores 
are assumed to be a good representation of the "true" 
score. An alternative analysis consists in using IRT models,
in which the responses to the items are modelled as
a function of a latent variable. This variable is considered
to be the ability measured by the questionnaire (e.g.
Health Related Quality of Life, anxiety...). Among the
IRT models, the Rasch model [1] is the most popular
when all the items have dichotomous responses. Indeed,
this model is the simplest one, and allows the derivation
of a scale with interesting psychometrical properties.
In particular, it is possible to show that the
estimations of the latent trait with this model are
independent of the retained items. This property of specific
objectivity allows the derivation of comparable measures
of the latent trait with different versions of the
questionnaire (for example, short or long version, with or
without missing values). As a consequence, there are
compelling arguments, when validating a scale, to retain
only those items which show a good fit to a Rasch
model [1].

Several indexes allow testing the fit of the Rasch
model. As for all the models of IRT, the Rasch model
relies on three fundamental assumptions: unidimensionality,
local independence and monotonicity. These
assumptions can be checked using Loevinger's
H coefficients [2,3], and in particular, the scalability
coefficient H. More specifically, the fit of the Rasch
model can also be considered. Among the fit tests that
have been proposed, the Q1 test [4] is one of the most
popular. However, the study of the fit of the Rasch
model can only be considered if the parameters of this
model are unbiased (parameters characterizing the
items, and the parameters of the distribution of the
latent variable, since only global measures on the sample
will generally be used in clinical research). Last, the
reliability of the measure of the latent trait by an IRT model
can be evaluated by the Personal Separation Index (PSI)
[5]. This index is close, in its interpretation, to
Cronbach's alpha [6], which is a well-known index of
reliability in CTT.

In the framework of PRO, it is frequent
to have a non-negligible rate of missing data,
which are often non-ignorable, because there might be a
link between the measured latent variable and the
probability of missingness of a response: for instance,
patients with worse levels on the latent variable are
more likely to have missing responses than other
patients [7]. For example, in the case of HR-QoL,
patients with a poor quality of life might be too tired to
respond to a question or to complete their questionnaire.
This phenomenon can influence the items differently:
some items can be more affected by a large rate of missing
data, such as items that deal with a topic that might
be difficult to express for the patient. As a consequence,
the dataset might contain more information on the
patients with a good level on the latent trait than on
patients with a poor level, introducing bias into
the subsequent analysis.

For this reason, it is important to take into account
the occurrence of missing data and the possibility of an
underlying mechanism of missingness when analysing the
dataset. Many authors suggest replacing the missing
data by the most probable result: this process is called
single (or simple) or multiple imputation [8]. Data are
then analysed using these imputed values. Several methods
have been proposed to impute missing responses to
items, depending on assumptions made on the missing
data mechanism. The most popular method for PRO
consists in imputing a missing value by the mean
response of the patient to the other items. Such a
method is clearly recommended in scoring manuals of
widely used questionnaires such as SF-36 and QLQ-C30,
for instance [9-11]. However, it is well-known that this
type of method might be inadequate [12-14], especially
when the rate of missing data is high [15].

Nevertheless, such simple imputation methods have 
been rarely compared in the framework of psychometric 
validation of PRO questionnaires, especially when an 
analysis by IRT is planned. Among the few papers on 
this topic, [16] and [17] compared only a small number 
of methods, for bias in the estimation of Cronbach's 
alpha and Loevinger's H coefficient. Sijtsma and van der 
Ark [17] also considered the fit of the Rasch model. 
However, the problem of the potential bias on the esti- 
mation of the parameters of this model is more impor- 
tant to consider in the first place, because the fit cannot 
be correctly evaluated with biased parameters. 

These two papers focused only on a small number of
methods. Moreover, their findings are difficult to compare
because different methodologies were used to
simulate the missing data. Furthermore, the impact of
the imputation methods on the bias in the parameters
of parametric IRT models remains unknown. We therefore
evaluated the impact of sixteen different methods
for handling missing values in the framework of the
Rasch model on (i) the bias of commonly used indices
for evaluating the fundamental assumptions of IRT
(Loevinger's H coefficient), (ii) the bias on the estimated
parameters of the Rasch model, (iii) the bias on a fit test
statistic, (iv) the bias on the measure of the reliability of
the estimation of the latent trait (PSI). These parameters
were chosen because they are the most important parameters
for validating a Rasch model.

All these investigations were carried out using a simulation
study. Such studies can add insight to what is known
from statistical theory, which often provides only
asymptotic results. Indeed, simulations
can be used to reflect real-life situations encountered in
practice that can be of interest to applied researchers
(various sample sizes, number of items...). Furthermore,
simulation studies can help to assess the suitability and
precision of different statistical models and in particular
the bias in the parameter estimates in relation to a
known simulated truth.

We performed a simulation study to evaluate the bias
on these parameters or indices, according to the chosen
method for handling missing values, the rate of missing
values, and whether the missing data were ignorable
or not.

Methods 

Notation 

Let

♦ $X_{nj}$ be the dichotomous variable representing the
response of the nth individual (n = 1, ..., N) to the jth
item (j = 1, ..., J) and $x_{nj}$ its realization [$x_{nj} = 0$ denotes
the more negative response to the jth item and $x_{nj} = 1$
the positive response]

♦ $D_{nj}$ be a dummy variable taking the value 1 if $x_{nj}$ is
observed and 0 otherwise, and $d_{nj}$ its realization

♦ $O_n$ be the set of observed responses for the nth
individual

♦ $M_j$ be the set of observed responses for the jth
item

♦ $o_n = \sum_{j=1}^{J} d_{nj}$ be the number of observed responses
for the nth individual

♦ $m_j = \sum_{n=1}^{N} d_{nj}$ be the number of observed responses
for the jth item

♦ $S_n = \sum_{j \in O_n} x_{nj}$ be the score of the nth individual
(number of positive non-missing responses)

♦ $T_j = \sum_{n \in M_j} x_{nj}$ be the number of positive non-missing
responses to the jth item

♦ $x^*_{nj}$ be the possibly imputed value used in the analysis
for $x_{nj}$ (note that $x^*_{nj} = x_{nj}$ if $d_{nj} = 1$)



Simulation design with (non)informative missing data 

Item Response Theory (IRT) [18] is a set of models that
allows measuring a latent variable $\theta$ that influences the
responses to the items. Three assumptions govern these
models:

♦ Unidimensionality: only one latent trait influences 
the responses to all the items, 

♦ Local Independence: for a given individual, the 
responses to the items are independent, 

♦ Monotonicity: the probability of giving a positive 
response to a given item does not decrease with the 
latent variable. 

$\theta$ is usually considered as a random variable and $\theta_n$
represents the latent trait of the nth patient. For each
patient, the probability of responding to each item is
computed according to a specific IRT model, the Rasch
model [1]:

$$P(X_{nj} = x_{nj} \mid \theta_n; \delta_j) = \frac{\exp\left(x_{nj}(\theta_n - \delta_j)\right)}{1 + \exp(\theta_n - \delta_j)} \tag{1}$$

where $x_{nj} = 0$ for a negative response and $x_{nj} = 1$ for a
positive response. $\delta_j$ is named the difficulty parameter of
the jth item, because the higher its value, the lower the
probability of positive response. We consider the latent
variable as a random variable following a normal distribution
with unknown parameters $\mu$ and $\sigma^2$. This implies
that the sample is representative of the underlying
population. Using the Local Independence assumption
underlying Item Response Theory (IRT), the marginal
likelihood is expressed as

$$L(\delta, \mu, \sigma^2) = \prod_{n=1}^{N} \int \prod_{j=1}^{J} \frac{\exp\left(x_{nj}(\theta - \delta_j)\right)}{1 + \exp(\theta - \delta_j)} \, G(\theta \mid \mu, \sigma^2) \, d\theta \tag{2}$$

with $G(\theta \mid \mu, \sigma^2)$ the density function of the normal distribution.
Note that the Rasch model can be considered as a Generalized
Linear Mixed Model with a logistic function as
link function.

We estimate $\delta_j$ (j = 1, ..., J), $\mu$ and $\sigma^2$ by maximizing
this marginal likelihood [1]. The integral can be
approximated with Gauss-Hermite quadratures. An
identifiability constraint must be defined, and generally,
$\mu = 0$ is used, but $\sum_{j=1}^{J} \delta_j = 0$ can also be used. Let
$\nu = \sum_{j=1}^{J} \frac{\delta_j}{J} - \mu$. $\nu$ is an estimable parameter, meaning
that its estimation is independent of the chosen
identifiability constraint. In the present paper, the chosen
identifiability constraint is $\mu = 0$ and consequently, a
bias on the $\nu$ parameter represents a global bias on the
$\delta_j$ parameters.
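
As an illustration of equation (1), the following minimal Python sketch computes the Rasch probability of a positive response and draws a complete dataset. The function names, the seed and the use of Python are assumptions made for this illustration only (the analyses of the paper were run in Stata); the difficulty values and the sample size are those of the simulation design described in this section.

```python
import math
import random


def rasch_probability(theta, delta):
    """P(X = 1 | theta, delta) under the Rasch model (equation 1 with x = 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def simulate_complete_data(n_individuals, deltas, mu=0.0, sigma=1.0, seed=0):
    """Draw theta_n ~ N(mu, sigma^2), then one Bernoulli response per item."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    thetas, data = [], []
    for _ in range(n_individuals):
        theta = rng.gauss(mu, sigma)
        thetas.append(theta)
        data.append([1 if rng.random() < rasch_probability(theta, d) else 0
                     for d in deltas])
    return thetas, data


# Five items with the difficulties of the simulation design, 500 individuals.
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]
thetas, data = simulate_complete_data(500, deltas)
```

With $\theta_n = \delta_j$ the probability is 0.5, and it decreases as the difficulty increases, which is the sense in which $\delta_j$ is a difficulty parameter.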

Three missing data mechanisms have been described by 
Rubin [19]: missing completely at random (MCAR), miss- 
ing at random (MAR), and missing not at random 
(MNAR). For instance, in case of a self-reported HR-QoL 
questionnaire, data can be considered MCAR if the prob- 
ability of missing data (missing response on one or more 
items for instance) is independent of the patient's HR- 
QoL. Data will be considered MAR if the probability of 
missing data may depend on covariates describing the 
patients or on items characteristics [13,17]. In contrast, 
data will be considered MNAR if the probability of missing 
data depends on the patient's (unobserved) HR-QoL. 

Data are simulated according to these three mechanisms,
following a methodology already proposed by
Sebille et al. [20] and close to the one used by Holman
and Glas [21] for exploring ignorability of the missing
data. More precisely, a latent variable denoted $\xi$ is used,
corresponding to the non-response propensity, which represents
the tendency of non-response and varies
between individuals. This latent variable may be influenced
by the value of the patient's latent trait $\theta$ (HR-QoL,
fatigue, ...) and may thus involve a non-ignorable
non-response framework corresponding to MNAR data.
To simulate the missing values, we assume that each
patient has a non-response propensity to each item
represented by the latent variable $\xi$. The realization of $\xi$
for the nth individual is denoted $\xi_n$.

Let $\rho = \mathrm{Corr}(\theta, \xi)$, $w$ a dummy variable (coded 0 or 1)
representing the link between the presence of missing
data and the difficulty of the items ($\delta_j$, j = 1, ..., J), $\pi$ be the
expected rate of missing values for each item and $\pi_n$ be
the probability for the nth patient to have a missing
value to each item. This probability is assumed to have
a lower bound equal to 1% and to be centred on $\pi$.
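
The non-response mechanism formalized in equation (3) can be sketched as follows. This is only an illustration under stated assumptions: the probability is taken to have the logistic form $\pi_n = 0.01 + (2\pi - 0.02)\, e^{\xi_n} / (1 + e^{\xi_n})$, which has a lower bound of 1% and is centred on $\pi$; the pair $(\theta_n, \xi_n)$ is drawn from a standard bivariate normal distribution with correlation $\rho$; and the role of $w$ is left out. All function names are hypothetical.

```python
import math
import random


def draw_theta_xi(rho, rng):
    """Draw (theta, xi), both standard normal, with Corr(theta, xi) = rho."""
    theta = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    xi = rho * theta + math.sqrt(1.0 - rho ** 2) * z
    return theta, xi


def missing_probability(xi, pi):
    """Assumed logistic form: 0.01 + (2*pi - 0.02) * e^xi / (1 + e^xi)."""
    return 0.01 + (2.0 * pi - 0.02) / (1.0 + math.exp(-xi))


def apply_missingness(row, xi, pi, rng):
    """Set each response to None with probability pi_n (same for every item)."""
    p_miss = missing_probability(xi, pi)
    return [None if rng.random() < p_miss else x for x in row]


rng = random.Random(1)  # hypothetical seed
theta, xi = draw_theta_xi(rho=-0.4, rng=rng)
observed = apply_missingness([1, 1, 0, 1, 0], xi, pi=0.20, rng=rng)
```

With $\rho < 0$, patients with a low latent trait tend to have a high non-response propensity, hence more missing responses.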



$$\forall j, \quad P(D_{nj} = 0) = \pi_n = 0.01 + (2\pi - 0.02) \, \frac{e^{\xi_n}}{1 + e^{\xi_n}} \tag{3}$$

According to the value of $\rho$ and $w$, different missing
data mechanisms could be simulated: for $\rho = 0$ and $w = 0$,
the missing data will be MCAR, for $\rho = 0$ and $w = 1$,
they are MAR, and for $\rho \neq 0$, the missing data are
considered as MNAR. We assume that a patient with a low
level on the latent trait (low level of HR-QoL for
instance) has a higher propensity to fail to respond to
the items, so $\rho$ is assumed to be less than or equal to 0.

Data were simulated with three different values for $\rho$:
$\rho = 0$ (MCAR or MAR data according to the value of
$w$), $\rho = -0.4$ (MNAR data with low level of informativity
of the missing data) and $\rho = -0.9$ (MNAR data with
high level of informativity of the missing data).

A thousand replications were simulated, each with 500
individuals. Five items were used and the difficulty
parameters were fixed to -1, -0.5, 0, 0.5 and 1. The values of
$\theta_n$ and $\xi_n$ were drawn from a standardized normal
distribution. Consequently, in all the simulations,
$\nu = \sum_{j=1}^{J} \frac{\delta_j}{J} - \mu = 0$. Three values have been considered
for $\pi$: 10%, 20% and 30%.

We first simulated complete datasets, then created
missing values by the process described above.

Methods for handling missing data in the framework of IRT

No imputation - NOIMP

NOIMP is not an imputation method. It consists in
treating all observed data. This method is often referred
to as "available case analysis".

Listwise Deletion - LD

LD is not an imputation method either [17]. It consists
in omitting the individuals with one or more missing
values. This method is often referred to as "complete
case analysis".

Worst case - WORST

WORST is a method which consists in substituting the
"worst" result for all the missing data. Often, the more
negative result is coded 0 (negative response), thus:

$$x^*_{nj} = 0 \quad \text{if } d_{nj} = 0 \tag{4}$$

Personal Mean Score - PMS and PMS-R

One of the most commonly used methods of imputation
in PRO is the Personal Mean Score (PMS) method,
which involves imputing a missing value using the average
score of the individual on the observed responses
(rounded to the nearest integer) [16,17]. This method is
used for example for the SF-36, which is one of the most
popular generic questionnaires of HR-QoL [10,11], or
for the QLQ-C30 [9], which is a questionnaire of HR-QoL
in Oncology.

$$x^*_{nj} = \mathrm{round}\left(\frac{S_n}{o_n}\right) \quad \text{if } d_{nj} = 0 \tag{5}$$

In the PMS-R method, $x^*_{nj}$ is randomly drawn from a
Bernoulli distribution with parameter $p = \frac{S_n}{o_n}$.
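
The PMS and PMS-R rules can be sketched in a few lines of Python. This is an illustrative sketch, not the -imputeitems- Stata module: missing responses are coded None, the function names are hypothetical, and a personal mean of exactly 0.5 is rounded up to 1, which is an assumption since the tie-breaking convention is not stated in the text.

```python
import random


def personal_mean(row):
    """S_n / o_n: proportion of positive answers among observed responses."""
    observed = [x for x in row if x is not None]
    return sum(observed) / len(observed)


def round_half_up(p):
    """Round a proportion to 0 or 1; a tie at 0.5 goes to 1 (assumption)."""
    return 1 if p >= 0.5 else 0


def impute_pms(row):
    """PMS: replace each missing response by the rounded personal mean."""
    p = personal_mean(row)
    return [round_half_up(p) if x is None else x for x in row]


def impute_pms_r(row, rng):
    """PMS-R: replace each missing response by a Bernoulli(S_n / o_n) draw."""
    p = personal_mean(row)
    return [(1 if rng.random() < p else 0) if x is None else x for x in row]


print(impute_pms([1, None, 1, 0, None]))  # personal mean 2/3, imputes 1
```

Because the deterministic version always imputes the value that agrees with the majority of the patient's own answers, it can only make response patterns more homogeneous, whereas the random draw preserves some within-patient variability.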



Item Mean Score - IMS and IMS-R

This method consists in imputing a missing value with the
item mean score (rounded to the nearest integer) [16].

$$x^*_{nj} = \mathrm{round}\left(\frac{T_j}{m_j}\right) \quad \text{if } d_{nj} = 0 \tag{6}$$

In the IMS-R method, $x^*_{nj}$ is randomly drawn from a
Bernoulli distribution with parameter $p = \frac{T_j}{m_j}$.

Corrected Item Mean - CIM and CIM-R

PMS only takes into account the ability of the individual,
and IMS only takes into account the difficulty of the item.
The Corrected Item Mean method is a combination of
these two methods: the item mean score is weighted by
the personal mean score of the individual [16].

$$x^*_{nj} = \mathrm{round}\left(\frac{S_n / o_n}{\frac{1}{o_n}\sum_{k \in O_n} T_k / m_k} \cdot \frac{T_j}{m_j}\right) \quad \text{if } d_{nj} = 0 \tag{7}$$

In the CIM-R method, $x^*_{nj}$ is randomly drawn from a
Bernoulli distribution with parameter
$\frac{S_n / o_n}{\frac{1}{o_n}\sum_{k \in O_n} T_k / m_k} \cdot \frac{T_j}{m_j}$.
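
A minimal sketch of the deterministic CIM rule is given below, assuming that the corrected item mean is the item mean $T_j / m_j$ multiplied by the ratio of the personal mean to the average of the item means over the items observed for that individual. Missing responses are coded None, the function names are hypothetical, ties at 0.5 are rounded up (an assumption), and the sketch does not guard against an average of item means equal to zero.

```python
def item_means(data):
    """T_j / m_j for each item, using observed (non-None) responses only."""
    n_items = len(data[0])
    means = []
    for j in range(n_items):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def cim_value(row, j, means):
    """Unrounded corrected item mean for one individual and missing item j."""
    obs_idx = [k for k, x in enumerate(row) if x is not None]
    person_mean = sum(row[k] for k in obs_idx) / len(obs_idx)
    mean_of_item_means = sum(means[k] for k in obs_idx) / len(obs_idx)
    return person_mean / mean_of_item_means * means[j]


def impute_cim(data):
    """CIM: replace each missing response by the rounded corrected item mean."""
    means = item_means(data)
    return [[(1 if cim_value(row, j, means) >= 0.5 else 0) if x is None else x
             for j, x in enumerate(row)] for row in data]


data = [[1, 1, 0], [1, None, 0], [1, 0, None], [0, 1, 1]]
imputed = impute_cim(data)
```

In this toy dataset the two individuals with a missing value have the same personal mean (0.5), but the imputed values differ because the missing items have different mean scores, which is the point of combining both sources of information.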



Item Correlation substitution - ICS

This method has two steps: (i) searching, for each item,
for the item most correlated with it, (ii) if the response of the
nth individual to the jth item is missing, we replace it
by the response of this individual to the item most
correlated with the jth item [16].

$$x^*_{nj} = x_{nk} \quad \text{if } d_{nj} = 0 \tag{8}$$

with

$$k = \arg\max_{l \neq j} \mathrm{Corr}(X_j, X_l) \tag{9}$$

with $X_j$ the variable representing the responses to the
jth item (j = 1, ..., J).
Logistic model - LOG and LOG-R 

This method consists in fitting a logistic model to each 
item with missing values, with the other items as covari- 
ates [22]. A stepwise selection procedure is subsequently 
used to iteratively select the items that are significantly 
related to the missing item, as assessed by the likelihood 
ratio test. 

That is, for an item j with missing values, the following
final model is fitted with the items, assuming items
k, $k \in K$ have been selected with the stepwise procedure
(K is the set of the indices of the selected items, $j \notin K$):

$$\mathrm{logit}(p_{nj}) = \beta_0 + \sum_{k \in K} \beta_k x_{nk} + \varepsilon_{nj}, \quad \forall n = 1, ..., N \tag{10}$$

where $p_{nj} = P(X_{nj} = 1)$ and $\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$.

In the LOG method, $x^*_{nj}$ is obtained by rounding the
obtained probability, and in the LOG-R method, $x^*_{nj}$ is
randomly drawn from a Bernoulli distribution using this
probability as its parameter.
Mokken model - MOK 

The imputation by the Mokken model [16,23] consists
in substituting the missing data by the most probable
values in order to obtain a response pattern which
produces as few Guttman errors as possible (a Guttman
error is produced when an individual responds negatively
to a given item and positively to a
more difficult item). For example, if a large proportion
of the sample endorses item A and only a small proportion
endorses item B, it is considered inconsistent to have
an individual who endorses item B, but not item A.

If the items are ordered from the most prevalent
item to the least prevalent, a coherent vector of
responses for a given individual is composed of 1s
then of 0s, for example (1,1,1,0,0) or (1,0,0,0,0). The
algorithm used for imputation is described here:

1. The items are sorted as a function of the number
of positive responses to each of them, from the
easiest item (item with the largest number of positive
responses) to the most difficult one.

2. For every missing value, the following five rules are
applied:

(a) If a positive response follows the missing
response, impute the value 1.

(b) If not, then if a negative response precedes
the missing response, impute the value 0.

(c) If not, then define $a_{00}$ as the number of negative
responses preceding a missing response, and
$a_{01}$ as the number of positive responses preceding
a missing response. If $a_{00} > a_{01}$, impute the
value 0.

(d) If not, then define $a_{10}$ as the number of
negative responses following a missing response,
and $a_{11}$ as the number of positive responses following
a missing response. If $a_{10} < a_{11}$, impute
the value 1.

(e) In all the other cases, impute a random draw
from the empirical distribution of the dichotomous
items, based on their proportion of positive
responses.
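
The notion of a Guttman error and the sorting of step 1 can be made concrete with the short sketch below. It does not implement the five imputation rules themselves; it only orders the items by prevalence and counts Guttman errors in a complete response vector. Missing responses are coded None and the function names are hypothetical.

```python
def order_items_by_prevalence(data):
    """Item indices sorted from the easiest item (most positive responses)
    to the most difficult one; missing responses (None) are ignored."""
    n_items = len(data[0])
    positives = [sum(row[j] for row in data if row[j] is not None)
                 for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: -positives[j])


def guttman_errors(row):
    """Count Guttman errors in a complete response vector whose items are
    already ordered from easiest to most difficult: each pair made of a
    negative response to an easier item and a positive response to a more
    difficult item counts as one error."""
    errors = 0
    for i in range(len(row)):
        for k in range(i + 1, len(row)):
            if row[i] == 0 and row[k] == 1:
                errors += 1
    return errors


print(guttman_errors([1, 1, 1, 0, 0]))  # coherent pattern: 0 errors
print(guttman_errors([0, 1, 1, 0, 0]))  # 2 errors
```

A coherent pattern (1s followed by 0s) yields zero errors, which is the target that the imputation rules try to approach.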

Rasch model - RAS, RAI and RAS-R 

The imputation by the Rasch model consists in substituting
the missing values using the rounded value of the
probability of obtaining a positive response predicted by
the Rasch model:

$$\hat{p}_{nj} = \frac{\exp(\hat{\theta}_n - \hat{\delta}_j)}{1 + \exp(\hat{\theta}_n - \hat{\delta}_j)} \tag{11}$$

In the RAS method, $x^*_{nj}$ is obtained by rounding $\hat{p}_{nj}$,
and in the RAS-R method, $x^*_{nj}$ is randomly drawn from a
Bernoulli distribution using $\hat{p}_{nj}$ as its parameter.

These two methods are implemented in the OPLM
software [24] to impute missing data in the One Parameter
Logistic Model [25], of which the Rasch model is
a particular case.

In the RAI (Iterative Rasch model) method, we substitute
the missing data by the RAS method, and then reestimate
the parameters of the Rasch model with the
substituted values, leading to a second substitution. This
process is repeated until two successive iterations give
exactly the same substituted values. The algorithm is
generally stopped at the 10th iteration.
Summary table 

Table 1 summarizes for each method whether it takes 
into account the ability of the individual, the difficulty 
of the item, the possibility of a random process or a 
likelihood based approach (when the imputation is 
based on a statistical model where the parameters are 
estimated by a maximum likelihood method). 
Note on the imputation process 

Imputation of missing data is only carried out for individuals
having more than 50% non-missing data (at least 3
responses among the 5 items). This restriction is commonly
used in practice, for example for the SF-36 and
QLQ-C30 questionnaires [9,10], and Sijtsma and van der
Ark [17] suggest that this yields more stable results.
Note that for the analysis, the individuals with three or
more missing items are not omitted but only their
observed responses have been used.



Hardouin et al. BMC Medical Research Methodology 2011, 11:105
http://www.biomedcentral.com/1471-2288/11/105



Table 1 Summary table of the characteristics of the
imputation methods used to handle missing data

Method   Ability of the   Difficulty of   Addition of a    Likelihood based
         individual       the item        random process   approach
NOIMP    -                -               -                -
LD       -                -               -                -
WORST    -                -               -                -
PMS      X                -               X                -
IMS      -                X               X                -
CIM      X                X               X                -
ICS      X                X               -                -
LOG      X                X               X                X
MOK      X                X               -                -
RAS      X                X               X                X
RAI      X                X               -                X


For the 1000 simulated datasets, using this restriction,
imputation could not be performed for an average of 6.0
individuals (over 500 individuals, 1%) when $\pi$ = 10%,
of 37.3 individuals (over 500 individuals, 7%) when $\pi$
= 20% and of 97.2 individuals (over 500 individuals,
19%) when $\pi$ = 30%.

We note that with the ICS, LOG and LOG-R methods,
imputation might not be possible in some cases:

♦ for ICS, if the most correlated item (of an item
presenting a missing response) is also missing,

♦ for LOG(-R), if the logistic model used to fit a
missing response includes covariates with missing
values.



Studied parameters 

We evaluate the impact of the chosen method to handle
missing data on different parameters.
Scalability index 

Loevinger's H coefficient [2] is used in non-parametric
Item Response Theory [3], and measures the scalability
of a questionnaire. It can be defined as

$$H = \frac{\sum_{j=1}^{J} \sum_{k \neq j} \mathrm{Cov}(X_j, X_k)}{\sum_{j=1}^{J} \sum_{k \neq j} \mathrm{Cov}^{(0)}(X_j, X_k)} \tag{12}$$

with $\mathrm{Cov}(X_j, X_k)$ the covariance between the items j
and k, and $\mathrm{Cov}^{(0)}(X_j, X_k)$ the maximum possible covariance
between these two items with fixed marginal
frequencies.
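
For dichotomous items, the maximum covariance with fixed marginal frequencies $p_j$ and $p_k$ is $\min(p_j, p_k) - p_j p_k$, so that equation (12) can be sketched as below. This is an illustration for complete data only, with a hypothetical function name; it is not the -loevh- Stata command, which was used with the pairwise option to handle missing values, and it does not guard against items answered identically by everyone (zero denominator).

```python
def loevinger_h(data):
    """Scalability coefficient H for complete dichotomous data (list of rows)."""
    n = len(data)
    n_items = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(n_items)]
    num = den = 0.0
    for j in range(n_items):
        for k in range(j + 1, n_items):
            p_jk = sum(row[j] * row[k] for row in data) / n
            num += p_jk - p[j] * p[k]             # observed covariance
            den += min(p[j], p[k]) - p[j] * p[k]  # maximum possible covariance
    return num / den


# A perfect Guttman scale gives H = 1.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(loevinger_h(perfect))
```

Summing over pairs with j < k instead of all ordered pairs leaves the ratio unchanged, since each pair contributes twice to both the numerator and the denominator of equation (12).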

Parameters of the Rasch model 

We studied the bias in different ways: the bias in estimating
the $\nu = \sum_{j=1}^{J} \frac{\delta_j}{J} - \mu$ parameter, the bias in estimating
the variance of the $\delta_j$ parameters ($\sigma_\delta^2$) and the
bias in estimating the variance of the latent trait ($\sigma^2$).

A positive bias on $\nu$ for instance signifies that the
latent trait is underestimated (or that the difficulty parameters
of the items are globally overestimated) and corresponds
to an optimistic result.

The variance of the $\delta$ parameters is defined by

$$\sigma_\delta^2 = \frac{1}{J} \sum_{j=1}^{J} \left( \bar{\delta}_j - \frac{1}{J} \sum_{k=1}^{J} \bar{\delta}_k \right)^2$$

with $\bar{\delta}_j$ the mean over the 1000 replications of the estimations
of the $\delta_j$ parameters. A positive bias on this
parameter signifies that the dispersion of the difficulty
parameters is overestimated.

The variance parameter of the latent trait $\sigma^2$ represents
the dispersion of the latent trait.
Fit of the Rasch model 

In order to evaluate the impact of the imputation methods,
we investigated the fit test statistic $Q_1$ [4]. In this
test, the observed numbers of positive responses to each
item are compared, for each score, with the frequencies
expected under the Rasch model assumption. Under the
null hypothesis, the statistic follows a chi-square distribution.
In this study, we evaluated, on the 1000 replications of each
case, the rate of rejection of the null hypothesis "Fit of
the Rasch model". This estimation allows evaluating the
type-I error of this fit test. It is expected that the rate of
rejection of the null hypothesis will be close to 5%
(because the initial datasets are simulated with a Rasch
model). If the 95% confidence interval does not contain
the value 5%, the corresponding imputation method
does not maintain the type-I error at its
expected level.

Reliability of the estimation of the latent trait 

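The Personal Separation Index (PSI) is a measure of the
reliability of the scale. It can be computed as

$$PSI = 1 - \frac{\mathrm{Var}(\hat{\theta})}{\mathrm{Var}(\theta)} \tag{13}$$

where $\mathrm{Var}(\hat{\theta})$ is evaluated by

$$\mathrm{Var}(\hat{\theta}) = \frac{1}{N} \sum_{n=1}^{N} \left[ s.e.(\hat{\theta}_n) \right]^2 \tag{14}$$

with $s.e.(\hat{\theta}_n)$ being the evaluated standard error of the
estimation of the $\theta_n$ parameter.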
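
Equations (13) and (14) amount to one minus the ratio of the mean squared standard error to the variance of the latent trait. A minimal sketch follows, assuming that the standard errors of the individual estimates and an evaluation of the variance of the latent trait are already available (how the latter is obtained is not specified here, so it is passed as an argument); the function name is hypothetical.

```python
def personal_separation_index(standard_errors, var_theta):
    """PSI = 1 - mean(s.e.^2) / Var(theta), following equations (13) and (14).

    standard_errors: s.e. of the estimate of theta_n for each individual
    var_theta: evaluated variance of the latent trait (supplied by the caller)
    """
    mean_error_variance = (sum(se ** 2 for se in standard_errors)
                           / len(standard_errors))
    return 1.0 - mean_error_variance / var_theta


print(personal_separation_index([0.5, 0.5], 1.0))  # 0.75
```

The index approaches 1 when the individual estimates are precise relative to the dispersion of the latent trait, which is why it is read like a reliability coefficient.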
Biases on the parameters 

For Loevinger's H coefficient (H) and Personal Separation
Index (PSI), the biases in estimating these parameters
are computed by comparing the estimation for
each replication to the corresponding estimation
obtained with complete datasets. For these two estimators,
if $\Psi$ is the random variable representing the
estimator, we denote $\psi_l$ the estimation obtained for the
lth replication and $\psi_l^{(0)}$ the corresponding value with
the full dataset. Then,

$$\mathrm{Bias}(\Psi) = \frac{\sum_{l=1}^{1000} \left( \psi_l - \psi_l^{(0)} \right)}{1000} \tag{15}$$

For $\nu$, $\sigma_\delta^2$ and $\sigma^2$, the bias is computed by comparing
the average of the estimations obtained on the 1000
replications to the values used in the simulation design
(0 for $\nu$, 0.5 for $\sigma_\delta^2$ and 1 for $\sigma^2$).

The bias is considered as negligible if it is less than
0.05 for H and PSI, less than 0.1 for $\nu$ and less than
0.2 for $\sigma_\delta^2$. For $Q_1$, the bias is considered as negligible if
the 95% confidence interval of the rate of rejection of
the assumption "$H_0$: fit of the Rasch Model" contains
the value 5%. For $\sigma^2$, the bias is considered as negligible
if the estimation is included in the interval [0.71; 1.37]
that contains 95% of the estimations of $\sigma^2$ obtained with
the full datasets. Since the bias on $\sigma^2$ is computed as
$\frac{\sum_{l=1}^{1000} \hat{\sigma}_l^2}{1000} - 1$, it is considered as large if it is less than
0.71 - 1 = -0.29 or greater than 1.37 - 1 = 0.37, and
small otherwise.



Software 

All analyses were done using Stata software. Loevinger's
H was computed with the -loevh- command [26]
(using the pairwise option), and the parameters of the
Rasch model were estimated with the -raschtest- [27]
command. The simulations were carried out with the
-simirt- module. Three Stata modules (-imputeitems-,
-imputerasch- and -imputemok-) were written to
impute the missing data. All these Stata modules can
be downloaded from the website of the first author
http://www.anaqol.org.

Results 

The results given in this section are based on the mean
results of the 1000 replications of each case. Formal statistical
tests have been carried out to determine potential
$\pi$ and $\rho$ effects for each imputation method on the
bias of each studied parameter. In the event, all the tests
were statistically significant, which raises the problematic
issue of the distinction between statistically significant
results and meaningful results or results of
practical importance. This is why the above-mentioned
thresholds are proposed to help distinguish small and
large bias.

The standard errors of the evaluations of all parameters
have been computed, but, since they remained
very stable whatever the values of $\pi$, $\rho$ and the missing
data mechanism (MCAR, MAR, MNAR), they were not
included in the tables.

Tables 2 to 7 present respectively the bias in estimating
the Loevinger's H coefficient (table 2), the $\nu$
(table 3), $\sigma_\delta^2$ (table 4) and $\sigma^2$ (table 5) parameters, the
rate of rejection of the Rasch model by the $Q_1$ test
(table 6) and the bias in estimating the PSI (table 7), for
all the studied values of the $w$, $\pi$ and $\rho$ parameters.



Table 2 Bias on the Loevinger's H coefficient as a function of the rate of missing data per item (π), the value of the
correlation coefficient ρ between the latent variable θ and the propensity to have missing data ξ for each method for
handling missing data (results for w = 0/w = 1)

        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     0.08/0.08    0.08/0.08    0.08/0.07    0.14/0.13    0.14/0.13    0.14/0.13    0.18/0.17    0.18/0.17    0.17/0.16
PMS-R   0.04/0.04    0.04/0.04    0.04/0.04    0.07/0.08    0.07/0.07    0.07/0.07    0.10/0.10    0.09/0.09    0.09/0.09
IMS     -0.02/-0.02  -0.02/-0.02  -0.02/-0.02  -0.03/-0.03  -0.04/-0.04  -0.04/-0.04  -0.04/-0.04  -0.05/-0.05  -0.06/-0.06
IMS-R   -0.04/-0.04  -0.04/-0.04  -0.04/-0.04  -0.07/-0.07  -0.07/-0.07  -0.08/-0.08  -0.09/-0.09  -0.09/-0.09  -0.10/-0.10
CIM     0.09/0.09    0.09/0.09    0.09/0.09    0.15/0.15    0.15/0.15    0.15/0.15    0.19/0.19    0.20/0.20    0.19/0.20
CIM-R   0.05/0.04    0.05/0.04    0.05/0.04    0.08/0.07    0.09/0.07    0.08/0.07    0.11/0.09    0.11/0.09    0.10/0.08
ICS     0.04/0.04    0.04/0.04    0.04/0.04    0.07/0.07    0.07/0.07    0.07/0.07    0.09/0.09    0.09/0.09    0.08/0.08
LOG     0.03/0.04    0.03/0.04    0.03/0.03    0.04/0.04    0.04/0.04    0.03/0.03    0.04/0.03    0.03/0.03    0.00/0.01
LOG-R   -0.01/-0.01  -0.01/-0.01  -0.01/-0.01  -0.02/-0.02  -0.02/-0.03  -0.03/-0.03  -0.04/-0.04  -0.05/-0.05  -0.06/-0.06
MOK     0.05/0.05    0.05/0.05    0.05/0.05    0.09/0.10    0.09/0.09    0.09/0.09    0.12/0.12    0.12/0.12    0.11/0.12
RAS     0.02/0.02    0.02/0.02    0.02/0.02    0.04/0.04    0.04/0.04    0.03/0.04    0.04/0.05    0.04/0.05    0.04/0.05
RAS-R   -0.01/-0.01  -0.01/-0.01  -0.01/-0.01  -0.01/-0.01  -0.01/-0.01  -0.01/-0.01  -0.01/-0.02  -0.02/-0.02  -0.02/-0.02
RAI     0.03/0.03    0.03/0.03    0.03/0.03    0.12/0.10    0.12/0.10    0.11/0.10    0.16/0.15    0.16/0.15    0.15/0.15
WORST   -0.05/-0.04  -0.03/-0.03  -0.02/-0.01  -0.09/-0.07  -0.07/-0.05  -0.04/-0.02  -0.11/-0.09  -0.09/-0.07  -0.07/-0.04
NOIMP   0.00/-0.00   -0.00/-0.00  -0.00/-0.00  -0.00/-0.00  -0.00/-0.00  -0.00/-0.00  -0.00/0.00   -0.01/-0.01  -0.04/-0.04
LD      0.00/-0.00   0.00/-0.00   -0.00/-0.00  -0.00/0.00   -0.00/-0.00  -0.02/-0.01  0.00/0.00    -0.00/-0.00  -0.01/-0.01

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Table 3 Bias on the v parameter as a function of the rate of missing data per item (77), the value of the correlation 
coefficient p between the latent variable 0 and the propensity to have missing data c for each method for handling 
missing data (results for w = 0/w = 1) 







        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     -0.04/-0.08  -0.04/-0.08  -0.03/-0.07  -0.04/-0.13  -0.04/-0.13  -0.05/-0.14  -0.05/-0.15  -0.08/-0.18  -0.11/-0.22
PMS-R   0.00/-0.03   -0.00/-0.03  -0.00/-0.03  0.00/-0.06   -0.00/-0.07  -0.03/-0.09  -0.00/-0.09  -0.03/-0.12  -0.07/-0.16
IMS     0.00/0.04    -0.03/0.02   -0.05/-0.00  0.00/0.08    -0.05/0.04   -0.12/-0.03  -0.00/0.13   -0.09/0.04   -0.19/-0.06
IMS-R   0.00/0.00    -0.02/-0.02  -0.04/-0.04  0.00/-0.00   -0.03/-0.03  -0.08/-0.08  -0.00/0.00   -0.06/-0.05  -0.13/-0.13
CIM     0.04/0.04    0.03/0.03    0.04/0.04    0.06/0.06    0.06/0.06    0.04/0.04    0.07/0.09    0.03/0.05    -0.00/0.00
CIM-R   0.02/0.04    0.01/0.03    0.01/0.04    0.03/0.01    0.01/0.00    0.01/-0.02   0.03/0.02    -0.01/-0.02  -0.05/-0.07
ICS     0.00/-0.02   -0.00/-0.03  -0.00/-0.03  0.00/-0.06   -0.01/-0.05  -0.03/-0.07  -0.00/-0.06  -0.03/-0.10  -0.07/-0.14
LOG     0.00/0.03    -0.02/0.01   -0.03/-0.01  0.00/0.05    -0.04/0.01   -0.11/-0.05  -0.00/0.09   -0.08/0.01   -0.19/-0.10
LOG-R   0.00/-0.00   -0.02/-0.01  -0.03/-0.03  0.00/-0.00   -0.03/-0.03  -0.08/-0.08  -0.00/0.00   -0.06/-0.06  -0.14/0.13
MOK     -0.01/0.01   -0.02/0.00   -0.03/-0.00  -0.02/0.02   -0.04/0.01   -0.08/-0.02  -0.03/0.05   -0.08/0.00   -0.13/-0.05
RAS     0.00/-0.06   -0.00/-0.06  0.01/-0.06   0.00/-0.12   -0.00/-0.11  -0.00/-0.12  -0.00/-0.17  -0.01/-0.18  -0.02/-0.20
RAS-R   0.00/-0.04   -0.01/-0.05  -0.01/-0.05  0.00/-0.08   0.01/-0.09   -0.03/-0.11  -0.00/-0.12  -0.03/-0.14  -0.06/-0.17
RAI     0.00/-0.06   -0.00/-0.06  0.00/-0.06   0.00/-0.06   0.00/-0.06   -0.00/-0.07  -0.00/-0.08  -0.02/-0.11  -0.05/-0.15
WORST   0.23/0.23    0.22/0.20    0.21/0.19    0.38/0.36    0.36/0.34    0.33/0.31    0.45/0.44    0.41/0.40    0.36/0.35
NOIMP   0.00/0.00    -0.01/-0.01  -0.02/-0.02  0.00/-0.00   -0.02/-0.02  -0.06/-0.06  -0.00/0.00   -0.05/-0.04  -0.11/-0.11
LD      0.00/-0.00   -0.10/-0.08  -0.21/-0.19  0.00/-0.00   -0.20/-0.19  -0.45/-0.42  -0.00/-0.00  -0.31/-0.29  -0.70/-0.67
FC      0.00/-0.00   0.00/-0.00   -0.00/0.00   -0.00/-0.00  -0.00/0.00   0.00/0.00    -0.00/0.00   -0.00/0.00   0.00/0.00


MCAR and MAR cases 

Bias is encountered for all methods in the MCAR (w = 0 and ρ = 0) and MAR (w = 1, ρ = 0) cases, but to a
different extent. For all the methods and all the studied parameters, the bias increases with π, although for
some methods the bias can be small even for high values of π.



With the exception of IMS and LOG, all the methods that do not incorporate a random process (PMS, ICS,
CIM, MOK, RAS, RAI, WORST) present bias on the majority of the parameters (at least 3 of the 6 studied
parameters) in these two cases. IMS presents small bias in the MCAR case (only for and PSI), but is



Table 4 Bias on the variance of the δj parameters as a function of the rate of missing data per item (π), the value of
the correlation coefficient ρ between the latent variable θ and the propensity to have missing data ξ for each
method for handling missing data (results for w = 0/w = 1)

        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     -0.05/-0.07  -0.05/-0.07  -0.06/-0.08  -0.09/-0.10  -0.09/-0.10  -0.09/-0.11  -0.10/-0.12  -0.10/-0.11  -0.09/-0.09
PMS-R   -0.08/-0.08  -0.08/-0.08  -0.08/-0.08  -0.13/-0.13  -0.13/-0.13  -0.13/-0.13  -0.16/-0.15  -0.15/-0.14  -0.15/-0.13
IMS     0.20/0.21    0.20/0.21    0.20/0.20    0.37/0.41    0.38/0.39    0.40/0.39    0.47/0.56    0.49/0.53    0.55/0.54
IMS-R   -0.03/-0.03  -0.03/-0.03  -0.03/-0.03  -0.05/-0.05  -0.05/-0.05  -0.04/-0.05  -0.06/-0.07  -0.06/-0.07  -0.03/-0.06
CIM     0.10/0.13    0.10/0.13    0.09/0.12    0.20/0.27    0.20/0.25    0.18/0.23    0.28/0.39    0.26/0.36    0.24/0.31
CIM-R   0.03/-0.03   0.02/-0.03   0.02/-0.02   0.05/0.06    0.05/0.05    0.03/0.03    0.07/0.07    0.06/0.06    0.06/0.04
ICS     -0.06/-0.06  -0.06/-0.06  -0.06/-0.06  -0.10/-0.09  -0.09/-0.09  -0.10/-0.09  -0.11/-0.11  -0.11/-0.10  -0.11/-0.10
LOG     0.16/0.16    0.16/0.16    0.16/0.15    0.28/0.31    0.29/0.29    0.30/0.28    0.38/0.43    0.40/0.40    0.44/0.37
LOG-R   -0.00/-0.00  -0.00/-0.00  -0.00/-0.00  -0.02/-0.01  -0.01/-0.02  -0.01/-0.02  -0.03/-0.03  -0.02/-0.03  0.00/-0.03
MOK     0.16/0.16    0.16/0.16    0.16/0.16    0.30/0.32    0.31/0.31    0.31/0.30    0.40/0.43    0.41/0.42    0.43/0.40
RAS     -0.22/-0.21  -0.23/-0.22  -0.22/-0.21  -0.34/-0.31  -0.34/-0.31  -0.34/-0.30  -0.39/-0.33  -0.39/-0.32  -0.40/-0.30
RAS-R   -0.18/-0.18  -0.18/-0.17  -0.18/-0.17  -0.28/-0.26  -0.28/-0.26  -0.28/-0.25  -0.33/-0.30  -0.32/-0.29  -0.32/-0.27
RAI     -0.22/-0.21  -0.22/-0.21  -0.22/-0.22  -0.19/-0.23  -0.19/-0.22  -0.20/-0.22  -0.19/-0.21  -0.19/-0.21  -0.20/-0.19
WORST   -0.01/0.08   -0.01/0.07   -0.01/0.08   0.11/0.26    0.09/0.25    0.06/0.22    0.20/0.43    0.13/0.37    0.06/0.30
NOIMP   0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.01/0.01    0.01/0.00    0.01/0.00    0.01/0.00    0.03/0.00
LD      0.00/0.00    0.02/0.02    0.09/0.08    0.00/0.00    0.09/0.08    0.42/0.37    0.01/0.02    0.21/0.19    0.98/0.92
FC      0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.00/-0.00

Table 5 Bias on the σ² parameter as a function of the rate of missing data per item (π), the value of the correlation
coefficient ρ between the latent variable θ and the propensity to have missing data ξ for each method for handling
missing data (results for w = 0/w = 1)







        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     0.61/0.58    0.61/0.58    0.62/0.57    1.27/1.14    1.27/1.14    1.26/1.11    1.64/1.58    1.59/1.57    1.56/1.50
PMS-R   0.34/0.35    0.34/0.35    0.34/0.34    0.62/0.64    0.62/0.62    0.61/0.62    0.85/0.86    0.82/0.86    0.79/0.81
IMS     -0.15/-0.14  -0.15/-0.15  -0.15/-0.16  -0.24/-0.24  -0.24/-0.24  -0.27/-0.27  -0.28/-0.29  -0.30/-0.29  -0.34/-0.32
IMS-R   -0.23/-0.22  -0.23/-0.23  -0.22/-0.23  -0.38/-0.38  -0.37/-0.38  -0.38/-0.39  -0.45/-0.47  -0.46/-0.46  -0.46/-0.48
CIM     0.59/0.57    0.60/0.58    0.62/0.58    1.15/1.09    1.17/1.13    1.20/1.14    1.54/1.48    1.55/1.56    1.55/1.60
CIM-R   0.32/0.28    0.33/0.29    0.35/0.30    0.59/0.50    0.61/0.52    0.62/0.52    0.80/0.64    0.78/0.66    0.78/0.65
ICS     0.32/0.33    0.32/0.33    0.33/0.33    0.57/0.59    0.57/0.61    0.56/0.57    0.76/0.77    0.74/0.78    0.71/0.74
LOG     0.19/0.20    0.19/0.20    0.18/0.18    0.22/0.22    0.22/0.24    0.16/0.19    0.18/0.16    0.13/0.18    0.02/0.07
LOG-R   -0.03/-0.13  -0.03/-0.03  -0.03/-0.04  -0.12/-0.12  -0.13/-0.12  -0.15/-0.15  -0.20/-0.23  -0.22/-0.22  -0.27/-0.29
MOK     0.30/0.31    0.30/0.31    0.30/0.29    0.54/0.56    0.55/0.57    0.54/0.55    0.75/0.74    0.72/0.76    0.69/0.73
RAS     0.26/0.28    0.26/0.28    0.28/0.27    0.49/0.50    0.49/0.53    0.49/0.53    0.65/0.67    0.62/0.70    0.60/0.70
RAS-R   0.04/0.05    0.04/0.05    0.05/0.04    0.06/0.07    0.07/0.09    0.07/0.09    0.10/0.09    0.08/0.10    0.07/0.09
RAI     0.29/0.31    0.29/0.31    0.30/0.30    1.12/0.92    1.12/0.95    1.11/0.95    1.53/1.40    1.49/1.42    1.46/1.41
WORST   -0.25/-0.21  -0.16/-0.13  -0.04/-0.03  -0.42/-0.27  -0.30/-0.25  -0.15/-0.11  -0.50/-0.47  -0.42/-0.36  -0.28/-0.14
NOIMP   0.01/0.02    0.01/0.02    0.02/0.02    0.00/0.01    0.01/0.02    0.00/-0.00   0.02/0.00    -0.00/0.01   -0.01/-0.01
LD      0.01/0.02    0.02/0.02    -0.00/-0.01  0.02/0.02    0.01/0.02    -0.05/-0.04  0.03/0.03    -0.02/0.02   -0.14/-0.12
FC      0.01/0.01    0.01/0.01    0.02/0.01    0.00/0.00    0.01/0.02    0.01/0.01    0.01/0.00    0.00/0.01    0.01/0.01



more biased in the MAR case. This result could be 
expected because IMS is the only imputation method 
(with WORST) that does not incorporate the difficulty 
of the items in the imputation process. 

Although the methods using a random process are generally better than the corresponding methods without
one, only LOG(-R), RAS-R, NOIMP and LD present little bias on the majority of the parameters in the
MCAR and MAR cases. For these methods, we note a higher rate of rejection of the Rasch model than
expected, a bias on σ² (for LOG(-R) and RAS-R), or on the PSI (for LOG and NOIMP). LD is the only method



Table 6 Rate of rejection of the Rasch model assumption with the Q1 test as a function of the rate of missing data per
item (π), the value of the correlation coefficient ρ between the latent variable θ and the propensity to have missing
data ξ for each method for handling missing data (results for w = 0/w = 1) [*: values significantly different from 5%]

        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     5.2/6.5      5.2/6.6*     4.9/6.3*     5.7/9.2*     5.9/9.7*     6.2/8.1*     4.6/22.3*    6.3/17.6*    6.2/18.5*
PMS-R   5.6/5.2      6.0/5.6      4.0/4.7      3.9/8.0*     4.3/6.4      6.2/6.0      5.2/8.5*     5.9/8.3*     6.7/8.2*
IMS     5.6/3.9      5.4/4.6      6.0/5.8      4.5/4.6      6.3/4.4      6.1/5.3      3.9/4.5      3.9/5.0      6.1/5.8
IMS-R   6.0/6.0      6.0/6.1      4.5/5.0      6.2/6.7*     4.4/5.7      4.9/5.1      4.7/7.4*     5.9/5.5      4.3/6.8
CIM     11.4/9.7*    8.5/9.3*     9.0/8.2*     19.6/19.5*   15.5/18.5*   14.2/19.1*   22.9/29.0*   22.2/28.8*   23.6/32.7*
CIM-R   5.9/3.7      4.9/4.4      5.2/4.4      6.9/4.4      6.9/5.6      6.6/4.5      6.9/5.5      6.8/4.9      8.0/6.3
ICS     49.6/54.8*   51.5/55.2*   51.7/54.0*   85.0/87.9*   86.0/88.6*   85.0/89.0*   94.1/97.6*   93.7/96.5*   95.6/97.7*
LOG     33.0/37.3*   34.9/34.6*   35.8/37.2*   64.2/63.6*   63.2/67.4*   61.5/65.9*   65.5/65.6*   63.2/68.1*   60.4/66.9*
LOG-R   10.6/12.6*   13.1/11.1*   11.9/11.6*   19.1/19.8*   19.5/18.7*   17.7/19.0*   22.2/26.4*   24.5/24.8*   22.6/24.0*
MOK     16.1/17.3*   17.9/18.6*   19.2/19.4*   42.2/51.3*   44.3/54.2*   49.4/56.9*   58.5/72.4*   60.2/75.0*   67.8/82.0*
RAS     21.8/23.8*   22.8/24.3*   21.5/26.1*   61.2/69.9*   63.9/67.6*   64.2/67.5*   80.6/87.7*   83.3/88.7*   81.0/86.3*
RAS-R   4.7/4.9      5.9/5.3      6.2/6.2      6.0/6.2      6.9/6.9*     7.3/6.0      6.6/6.6*     6.7/16.9*    6.8/5.7
RAI     21.9/24.3*   22.9/24.3*   22.6/27.5*   17.8/45.0*   19.6/40.2*   18.3/36.9*   10.1/61.0*   14.1/52.5*   15.4/45.0*
WORST   6.8/5.5      6.1/5.8*     5.1/3.2*     6.7/6.9*     6.2/4.4      5.0/4.8      10.0/6.9*    9.2/5.8      8.0/4.0
NOIMP   6.5/7.8*     7.5/8.2*     6.9/7.6*     9.2/9.7*     11.0/10.9*   10.2/9.4*    13.3/14.8*   13.8/14.0*   9.6/9.5*
LD      4.9/5.1      4.0/5.2      5.1/4.9      4.8/5.2      5.1/5.0      4.3/4.4      3.4/3.3*     3.4/4.5      3.3/2.5*
FC      5.2/5.5      4.8/4.4      5.1/5.0      5.4/4.6      4.1/3.9      4.9/4.9      4.0/4.8      4.7/4.5      4.9/3.8*

Table 7 Bias on the PSI as a function of the rate of missing data per item (π), the value of the correlation coefficient ρ
between the latent variable θ and the propensity to have missing data ξ for each method for handling missing data
(results for w = 0/w = 1)







        π = 10%                                π = 20%                                π = 30%
Method  ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9     ρ = 0.0      ρ = -0.4     ρ = -0.9
PMS     0.09/0.09    0.09/0.09    0.09/0.10    0.13/0.13    0.13/0.13    0.13/0.13    0.13/0.09    0.13/0.09    0.12/0.09
PMS-R   0.06/0.06    0.06/0.06    0.06/0.06    0.08/0.08    0.08/0.08    0.08/0.08    0.07/0.06    0.07/0.06    0.06/0.06
IMS     -0.05/-0.05  -0.05/-0.05  -0.05/-0.05  -0.09/-0.09  -0.10/-0.09  -0.10/-0.10  -0.14/-0.05  -0.14/-0.05  -0.16/-0.05
IMS-R   -0.06/-0.06  -0.06/-0.06  -0.06/-0.06  -0.12/-0.12  -0.12/-0.12  -0.12/-0.12  -0.17/-0.06  -0.17/-0.06  -0.18/-0.06
CIM     0.09/0.08    0.09/0.09    0.09/0.09    0.12/0.12    0.12/0.12    0.12/0.12    0.12/0.08    0.12/0.09    0.12/0.09
CIM-R   0.05/0.05    0.05/0.05    0.06/0.05    0.07/0.06    0.08/0.06    0.08/0.07    0.06/0.05    0.06/0.05    0.06/0.05
ICS     0.05/0.05    0.05/0.05    0.05/0.05    0.07/0.07    0.07/0.07    0.07/0.07    0.05/0.05    0.05/0.05    0.04/0.05
LOG     0.02/0.02    0.02/0.02    0.02/0.02    -0.00/0.00   -0.00/0.00   -0.01/-0.01  -0.05/0.02   -0.06/0.02   -0.08/0.02
LOG-R   -0.02/-0.02  -0.02/-0.02  -0.02/-0.02  -0.06/-0.06  -0.06/-0.06  -0.07/-0.05  -0.11/-0.02  -0.11/-0.02  -0.13/-0.02
MOK     0.05/0.05    0.05/0.05    0.04/0.04    0.06/0.07    0.06/0.06    0.06/0.06    0.05/0.05    0.05/0.05    0.04/0.04
RAS     0.05/0.05    0.05/0.05    0.05/0.05    0.07/0.07    0.07/0.07    0.07/0.07    0.05/0.06    0.05/0.06    0.04/0.06
RAS-R   0.01/0.01    0.01/0.01    0.01/0.01    -0.00/0.00   0.00/0.00    0.00/0.00    -0.03/0.01   -0.03/0.01   -0.03/0.01
RAI     0.05/0.01    0.05/0.06    0.05/0.06    0.12/0.12    0.12/0.12    0.12/0.12    0.13/0.06    0.12/0.06    0.12/0.06
WORST   -0.06/-0.06  -0.04/-0.04  -0.01/-0.01  -0.13/-0.12  -0.09/-0.09  -0.05/-0.05  -0.19/-0.06  -0.16/-0.04  -0.12/-0.01
NOIMP   -0.03/-0.03  -0.03/-0.03  -0.03/-0.03  -0.06/-0.06  -0.07/-0.06  -0.07/-0.06  -0.10/-0.03  -0.10/-0.03  -0.11/-0.03
LD      -0.00/-0.00  -0.00/-0.00  -0.01/-0.00  -0.00/-0.00  -0.01/-0.01  -0.03/-0.02  -0.01/-0.00  -0.02/-0.00  -0.07/-0.01


that displays a rate of rejection of the Rasch model which is significantly lower than 5%. This phenomenon
can be explained by the fact that LD omits all the individuals with at least one missing value, so the number
of remaining individuals is smaller than with the other methods. As a consequence, the Q1 test, which is a
chi-square type test, might lack power to detect small deviations from the Rasch model.

By contrast, MOK, CIM and WORST present a non-negligible bias on all the parameters except v in the
MCAR and MAR cases, and PMS, RAS and RAI are very biased methods in the MAR case.

MNAR cases 

All the methods present several biases in the MNAR case (ρ ≠ 0). For all the methods and all the studied
parameters, the bias increases with π, even if for some methods the bias can be negligible even for high
values of π. Generally, the effect of the ρ parameter is smaller (except for WORST or LD) and can reinforce
or reduce the bias when ρ increases in absolute value.

NOIMP, LOG-R, and RAS-R are the three methods that produce the smallest number of biased parameters
in the MNAR case. Indeed, if the rate of missing values is low (π = 10%), RAS-R is unbiased on all the
studied parameters, and LOG-R and NOIMP are biased only on the rate of rejection of the Rasch model.
Nevertheless, when the rate of missing values is larger than 10%, these three methods present bias on the
rate of rejection of the Rasch model, NOIMP and LOG-R present bias on v and PSI, and RAS-R presents
bias on σ².



For the methods PMS, IMS, CIM, LOG and RAS, the addition of a random process in the imputation
process seems to reduce the bias on all the parameters. As for the MCAR and MAR cases, LD is the only
method that produces a lower rate of rejection of the Rasch model than expected, and this could be
explained by the number of individuals used with this method. WORST and RAI produce a systematic
non-negligible bias on all the studied parameters, and PMS, CIM, MOK and RAS display a non-negligible
bias on 5 of the 6 studied parameters.

Discussion 

Sixteen methods for handling missing data have been 
investigated in the framework of psychometric validation 
of a PRO scale using IRT-based methodology. Several 
situations were considered according to the type of 
missing data one might encounter in practice: namely 
MCAR, MAR or MNAR type of missing data. 

Some of the investigated methods can be referred to as principled methods, either relying mostly on
likelihood-based analysis, such as Rasch models, or handling the missing data without imputation, such as
NOIMP or LD. The others are unprincipled or ad-hoc methods, such as PMS, IMS, CIM or WORST. Some of
the latter methods (PMS, IMS) are frequently used for missing data imputation in HR-QoL scales even
though they are known to provide biased estimations [28] in cross-sectional or longitudinal settings. By
contrast, the former principled methods are likely to be consistent under MCAR and sometimes MAR
mechanisms.

As expected, we observed that principled methods such as NOIMP and LD were rarely biased (except
regarding the Q1 test) under MCAR and MAR mechanisms whatever the amount of missing data. By
contrast, unprincipled methods such as PMS, CIM, ICS, MOK or WORST were almost systematically
biased even under MCAR and MAR mechanisms. More precisely, most of the methods taking into account
the ability of the individuals in the imputation process tend to overestimate the psychometric quality of the
scale (measured for example by Loevinger's H coefficient or the PSI). This result was already noted by
Huisman [16] and reflects the fact that these methods assume good properties of the scale and hence tend
to incorrectly enhance its psychometric performance during imputation.

Moreover, the methods incorporating the ability of the individual also overestimated the variance of the
latent trait (σ²), thus creating artificial heterogeneity between the individuals. Such methods are more
likely to impute a negative (positive) response to a patient whose observed score is low (high) and
consequently falsely amplify the distance between individuals on the latent trait scale. In most cases, the
addition of a random process substantially reduced the bias, and it should be used systematically when
possible [29].
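
The effect of adding a random process can be illustrated with a minimal Python sketch. It assumes
dichotomous items coded 0/1 with missing responses coded None, and it assumes that the deterministic
variant rounds the personal mean of the observed responses while the random variant draws the imputed
value from a Bernoulli distribution with that mean. The function name impute_pms and the tie-breaking
rule at 0.5 are illustrative choices, not the implementation used in this study.

```python
import random


def impute_pms(responses, rng=None):
    """Fill missing (None) 0/1 responses using the person's own mean.

    rng is None -> deterministic variant: round the personal mean to 0 or 1.
    rng given   -> random variant: draw from Bernoulli(personal mean).
    """
    observed = [r for r in responses if r is not None]
    if not observed:
        return list(responses)  # no observed response to base the imputation on
    mean = sum(observed) / len(observed)
    filled = []
    for r in responses:
        if r is not None:
            filled.append(r)
        elif rng is None:
            filled.append(1 if mean >= 0.5 else 0)  # tie at 0.5 -> 1 (assumption)
        else:
            filled.append(1 if rng.random() < mean else 0)
    return filled


patient = [1, 1, 0, None, 1, None, 0, 1]  # 4 favourable responses out of 6 observed
deterministic = impute_pms(patient)

# Repeat the random variant many times and look at the first imputed item.
rng = random.Random(0)
draws = [impute_pms(patient, rng)[3] for _ in range(10000)]
share_of_ones = sum(draws) / len(draws)
print(deterministic, round(share_of_ones, 2))
```

For this patient the deterministic variant always imputes a favourable response, whereas the random
variant imputes one in roughly two thirds of the draws, which limits the artificial separation of
individuals on the latent trait described above.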

The impact of the imputation methods in terms of bias was usually intensified under the MNAR mechanism,
except for NOIMP, LOG-R and RAS-R, which displayed the most robust results and usually remained
unbiased (bias, if present, remained rather slight when π < 20%). However, this time, LD was also affected
and displayed bias, especially on the v and item difficulties variance parameters (σ²). Moreover, the type I
error of the goodness-of-fit Q1 test was underestimated for LD when the amount of missingness was high
(π = 30%), possibly reflecting a loss in power. It is well known that MNAR missing data may substantially
affect the representativeness of a study sample in relation to the target population. In this study, MNAR
missing data were simulated such that patients with a lower level on the latent trait (reduced HR-QoL for
instance) had a higher non-response propensity. The likelihood of missing data could also be larger as the
item difficulties increased. As a consequence, in case of MNAR data, the data suffer from sample selection
bias: for instance, patients having the highest levels on the latent trait primarily remained in the study and,
under some circumstances, the easiest items were answered more often. This usually leads to overestimating
the latent trait level (and jointly underestimating item difficulties), producing negative bias for the v
parameter, except for the WORST method, which systematically underestimates the latent trait level by
imputing only negative responses. A ρ effect was observed for most of the methods on v (except for
CIM(-R)) and it could sometimes be quite large. This effect, reflecting the strong informativity of the
missing data, generally enlarged the bias that was already observed, except for the WORST method, for
which the bias was attenuated but still remained.
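
The selection effect described above can be reproduced with a small simulation sketch in Python. It
is not the simulation design of this study: it only assumes that the latent trait θ and the
propensity ξ are standard normal with correlation ρ, and it adds a hypothetical per-response noise
term so that a patient skips some items rather than all of them. The dependence on item difficulty
(w = 1) is left out, and the function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def simulate_missingness(n_patients, n_items, pi, rho, seed=0):
    """Return (theta, mask); mask[n][j] is True when response (n, j) is missing."""
    rng = random.Random(seed)
    cut = NormalDist().inv_cdf(1.0 - pi)  # P(N(0, 1) > cut) = pi
    theta, mask = [], []
    for _ in range(n_patients):
        t = rng.gauss(0.0, 1.0)  # latent trait
        # Propensity to have missing data, correlated with the latent trait.
        xi = rho * t + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Hypothetical per-response noise; (xi + noise) / sqrt(2) is again N(0, 1),
        # so the marginal missing rate per item equals pi.
        row = [(xi + rng.gauss(0.0, 1.0)) / math.sqrt(2.0) > cut
               for _ in range(n_items)]
        theta.append(t)
        mask.append(row)
    return theta, mask


def summarize(theta, mask):
    """Missing rate, and mean latent trait weighted by the number of observed responses."""
    n_cells = sum(len(row) for row in mask)
    n_missing = sum(sum(row) for row in mask)
    weighted = sum(t * (len(row) - sum(row)) for t, row in zip(theta, mask))
    return n_missing / n_cells, weighted / (n_cells - n_missing)


theta, mask = simulate_missingness(4000, 10, pi=0.30, rho=-0.9)
rate, mean_observed = summarize(theta, mask)
print(round(rate, 2), round(mean_observed, 2))
```

With a negative ρ, the observed responses come predominantly from patients with a high latent trait,
so the response-weighted mean of θ lies above the population mean of 0; with ρ = 0 it stays close
to 0.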

Although one could expect poor results using such unprincipled or ad-hoc simple imputation methods for
handling missing data, little was known about the impact of using one method or another on the quality of
questionnaire validation studies. Indeed, missing data are solely described in such studies for assessing
acceptability of a questionnaire [30,31], and PMS or IMS-based methods are often used for imputation. As a
matter of fact, one of the most commonly used imputation methods in a wide range of PRO studies
(validation or clinical research studies), namely PMS, displayed poor properties regarding bias on a large
number of parameters whatever the studied situation (MCAR, MAR or MNAR data) and the amount of
missing data. As a consequence, this method should be avoided because it is very likely to overestimate the
psychometric qualities of scales. Furthermore, PMS might also decrease the power of a test aimed at
comparing two groups of patients on a PRO measure by artificially increasing the variance of the latent
trait. This is in line with other authors such as Chavance [22], who recommends the use of this imputation
method only if the rate of missing values is small (below 5%). Moreover, Fayers et al. [32] gave six
conditions for using PMS, which are rarely met in practice.

The methods based on Rasch models without a random process (RAS and RAI) often displayed poor results
regarding bias on several parameters, especially on the variance of the latent trait, which was
overestimated, and on the dispersion of the item difficulty parameters, which was underestimated. It was
unforeseen that these possibly attractive methods should in fact be avoided, even though this had already
been noted, but not formally evaluated, by Sijtsma and van der Ark [17].

The analysis without imputation (NOIMP) is a good alternative to simple imputation, provided all the
responses are used in the analysis, under MCAR, MAR and even MNAR data. This result could be expected
because one of the most important properties of the Rasch model is specific objectivity. This property
implies that i) all estimated difficulty parameters are independent of the sample used for estimation (item
parameter invariance), and ii) all latent trait related parameters are independent of the items used for
estimation (person parameter invariance). Consequently, the estimations of the parameters are consistent,
even with an incomplete dataset, and whatever the type of missing data. However, some specificities of this
study have to be mentioned: Loevinger's H coefficient has been computed by a pairwise technique, which
consists in using all the contingency tables between each pair of items in order to compute this index (the
usual procedure consists in estimating this index by listwise deletion). The same remark can be made
concerning the parameters of the Rasch model, which have been estimated by marginal maximum
likelihood, allowing all observed responses to be taken into account. Other methods of estimation
(conditional maximum likelihood for example), omitting the individuals with one or several missing data,
might lead to poorer results.
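
The pairwise technique can be sketched in Python as follows. The sketch assumes dichotomous items
coded 0/1 with None for missing responses and uses the usual scale-level definition of Loevinger's H,
namely the sum of the pairwise item covariances divided by the sum of their maximum values given the
item marginals. The function name is illustrative, and this is not the code used in this study.

```python
from itertools import combinations


def loevinger_h_pairwise(data):
    """Scale-level Loevinger's H for 0/1 items, with None marking a missing response.

    Each item pair uses every patient who answered both items (pairwise
    technique) instead of dropping patients with any missing response.
    """
    n_items = len(data[0])
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(n_items), 2):
        pairs = [(row[i], row[j]) for row in data
                 if row[i] is not None and row[j] is not None]
        if not pairs:
            continue  # no patient answered both items
        n = len(pairs)
        p_i = sum(a for a, _ in pairs) / n
        p_j = sum(b for _, b in pairs) / n
        p_ij = sum(a * b for a, b in pairs) / n
        cov_sum += p_ij - p_i * p_j
        # Largest covariance attainable with these marginals.
        cov_max_sum += min(p_i, p_j) - p_i * p_j
    if cov_max_sum == 0:
        raise ValueError("H is undefined: no item pair has a positive maximum covariance")
    return cov_sum / cov_max_sum


# Perfect Guttman pattern, plus one patient with a missing response on item 2.
guttman = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, None, 0],
]
h_guttman = loevinger_h_pairwise(guttman)
print(round(h_guttman, 3))
```

The patient with a missing response still contributes to the contingency table of the first and third
items, whereas listwise deletion would discard that row entirely.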

Our study focused on simple imputation methods that are frequently encountered in practice in most
studies aiming at validating or analysing PRO data. An important issue with such methods is that they will
often lead to a misleading estimation of precision, which is often overestimated. Since our major objective
was to highlight the strong deleterious impact that these methods also have in the framework of studies
aiming at validating PRO scales, other alternatives for handling missing data were not evaluated. This is
the case of hot deck substitution [16,33], imputation based on the Response Function Imputation [17], and
Two-way imputation [17,34]. Moreover, we have not tested multiple imputation methods, which are
recommended by several authors [8,15,17] in order to provide valid inferences for statistical estimates
from incomplete data and more stable results. However, under MCAR or MAR, multiple imputation should
lead to analyses that are similar to likelihood-based analyses, being asymptotically equivalent as the
number of imputations increases.

Conclusion 

This study shows that the choice of the imputation method must be made with care during the validation
of a scale by a Rasch model in the presence of missing data. If the missing data are suspected to be MCAR
or MAR, several principled methods could be used, such as the RAS-R, NOIMP or LD methods. However, if
the missing data are suspected to be MNAR, RAS-R or NOIMP might be preferred (and LD must be
avoided), but it seems sensible to carry out the analysis only if a small number of missing data (π = 10%)
is present. If the number of missing data is too large, none of the methods used to handle missing data
seems to produce accurate results on the majority of the parameters, and consequently all the analyses
might be biased. One can also stress that all the methods not including a random process, in particular
PMS (the most popular method), should be disregarded.



Finally, the impact of the choice of an imputation method on the statistical properties of tests aimed at
comparing PRO data from two groups of patients is an important topic that deserves future investigation.